HW3-drafting-viz

Author

Natalie Smith

Published

February 23, 2025

1 Question 1:

  1. Which option do you plan to pursue? It’s okay if this has changed since HW #1.

    I’m still planning on pursuing the infographic.

2 Question 2:

  1. Restate your question(s). Has this changed at all since HW #1? If yes, how so?

    My questions have shifted slightly since Homework 1 -

    • How have major air pollutants trended over the last 10–20 years in Los Angeles, CA?

    • Which pollutants are the biggest contributors to poor AQI?

      • It’s PM 2.5
    • Which areas of Los Angeles are most affected by PM2.5?

    • What are the sources of PM2.5?

3 Question 3:

  1. Explain which variables from your data set(s) you will use to answer your question(s), and how.

    • To see how air pollution as trended over time, I’ve used AQI Summary data from the EPA. I was able to calculate the median AQI per year from 2000-2024, giving me two variables, AQI (number) and year (number). Using the same AQI data, I was able to see the main pollutants contributing to the AQI by grouping by year and pollutant to see the top pollutants per year. I was then able to average those counts to get the top pollutant count for 2000-2024, giving me two variables, main pollutant (chr) and average count (number).

    • To see which areas are most affected by PM 2.5 in Los Angeles, I used CalEnviroscreen data. After tidying the spatial data to show only the incorporated regions of Los Angeles County, I was able to plot PM 2.5 percentile across the county. The variables I’ve filtered and selected for in the Enviroscreen data include PM 2.5 percentile (num), tract (num), zip (num), geometry (sfc). I kept zip code for quick area reference.

    • To see the sources of PM 2.5, I tidied and wrangled two datasets, Point Sources and Nonpoint Sources, from the EPA’s National Emissions Inventory (NEI) for the most recent year available (2020). After joining the datasets, I was left with three variables: pollution source (chr), source type (chr), and total emissions (num).

4 Question 4:

  1. Find at least two data visualizations that you could (potentially) borrow / adapt pieces from. Link to them or download and embed them into your .qmd file, and explain which elements you might borrow (e.g. the graphic form, legend design, layout, etc.).

Loving this infographic by Neel Dhanesha. I’ve been struggling to figure out to put my final viz together. I like how this infographic is very data driven but has many strategic annotaions and well as short paragraphs to explain each graph and move the story forward. I may try a similar spacing, and longer captions.

Giorgia Lupi was one of my favorite finds while exploring other visualizations - she often mixes photography, illustration, and digital elements into her visualization, which i find so beautiful…almost like mixed media art. I dont think I will be making any graphs with photos like this one, but I did find a photo of Los Angeles to base my color scheme off of, and I’d like to incorporate some hand drawn elements into my infographic.

5 Question 5:

  1. Hand-draw your anticipated visualizations, then take a photo of your drawing(s) and embed it in your rendered .qmd file – note that these are not exploratory visualizations, but rather your plan for your final visualizations that you will eventually polish and submit with HW #4. You should have a sketch of your infographic (which should include at least three component visualizations) if you are pursuing option 1.

idea 1 - using circles to show main pollutant idea 2 - using a donut to show main pollutant

Here are two options I’m playing around with. Idea 1 on the left includes the main pollutants as bubbles and Idea 2 on the right shows the main pollutants in a donut chart. Additionally, I’m playing around with illustrated point sources vs a graph.

6 Question 6:

  1. Mock up all of your hand drawn visualizations using code. We understand that you will continue to iterate on these into HW #4 (particularly after receiving feedback), but by the end of HW #3, you should:

    • have your data plotted (if you’re experimenting with a graphic form(s) that was not explicitly covered in class, we understand that this may take some more time to build; you should have as much put together as possible)

    • use appropriate strategies to highlight / focus attention on a clear message

    • include appropriate text such as titles, captions, axis labels

    • experiment with colors and typefaces / fonts

    • create a presentable / aesthetically-pleasing theme (e.g. (re)move gridlines / legends as appropriate, adjust font sizes, etc.) Please note for all of the visualizations I will most likely not be using these titles, subtitles, annotations. I plan on pulling them into Affinity and playing with text there.

6.1 Setup

#libraries: 
library(tidyverse)   # dplyr, purrr, etc.
library(janitor)     # for clean_names()
library(lubridate)   # for mdy(), etc
library(here)        # for here()
library(doBy)        # for summaryBy()
library(scales)
library(showtext) # allows us to import google fonts 
library(glue)
library(ggtext)
library(sf)
library(here)

#......................import Google fonts.......................
# `name` is the name of the font as it appears in Google Fonts
# `family` is the user-specified id that you'll use to apply a font in your ggpplot
font_add_google(name = "Montserrat", family = "mont")
font_add_google(name = "Open Sans", family = "open_sans")



#now we need to 'turn show text on'
#......enable {showtext} rendering for all newly opened GDs......
showtext_auto()


smog_pal <- c("#79AAB6", 
              "#B0C7C1", 
              "#E3AC79", 
              "#CC7E62",
              "#B77B70")


smog_pal2 <- c("#79AAB6", 
              "#B0C7C1", 
              "#DFAF75", 
              "#DE8635",
              "#DF674F")

custom_smog_pal <- c(
 "Ozone"= "#79AAB6",
 "PM10" = "#B0C7C1", 
 "CO" =   "#DFAF75",  
 "NO2" =  "#DE8635",    
 "PM2.5"= "#DF674F"      
)

smog_sub_pal <- smog_pal[c(4,5)]

6.2 Air Quaility Over Time in Los Angeles

6.2.1 Median AQI - import, tidy, and wrangle

# make a loop to pull in multiple years worth of data files and tidy

# data ranges from 2000 - 2024
years <- 2000:2024

# -------- S E T  U P  L O O P  --------
median_aqi_list <- map(years, ~ {
  
      # build the file path
  aqi_file_path <- here("data/aqi_daily", paste0("aqidaily_", .x, ".csv"))
  
# -------- R E A D  I N  D A T A --------
  read_csv(aqi_file_path) %>%
    clean_names() %>%
    
# -------- T I D Y--------
      #transform date from a character to a date
    mutate(date = mdy(date)) %>%
    
      #extract year
    mutate(year = year(date)) %>% 

      #calculate median AQI
    group_by(year) %>%
    summarize(
      median_aqi = median(overall_aqi_value, na.rm = TRUE),
      .groups = "drop")
    
})

# combine into one data frame
median_aqi_df <- bind_rows(median_aqi_list)

6.2.2 Viz

#subtitle: 
subtitle <- glue::glue("
    Tracking median Air Quality Index (AQI) over time:<br>
    <span style='color:#88B589;'>**Good**</span>, 
    <span style='color:#F3D03E;'>**Moderate**</span>, 
    <span style='color:#E67E52;'>**Unhealthy for Sensitive Groups**</span>"
)

aqi_plot <- ggplot(median_aqi_df, aes(x = year, y = median_aqi)) +
  
  # ADD COLOR TO AXIS LINES
  geom_segment(aes(x = 1999.65, 
                   xend = 1999.65, 
                   y = 50, yend = 60), 
                   color = "#88B589", 
                   linewidth = 2) +
  geom_segment(aes(x = 1999.65, 
                   xend = 1999.65, 
                   y = 60, yend = 100), 
                   color = "#F3D03E", 
                   linewidth = 2) +
  geom_segment(aes(x = 1999.65, 
                   xend = 1999.65,
                   y = 100, 
                   yend = 105), 
                  color = "#E67E52", 
                 linewidth = 2) +
  
# add lines to show where each threshold starts  
  # med - high
    geom_segment(aes(x = 1999.65, 
                     xend = 2024,
                     y = 100, 
                     yend = 100),
                     color = "grey90",
                     linewidth = 0.5,
                     linetype = "dashed") +
  
    # low - med
    geom_segment(aes(x = 1999.65, 
                     xend = 2024,
                     y = 60, 
                     yend = 60),
                     color = "grey90",
                     linewidth = 0.5, 
                     linetype = "dashed") +
  
  
  geom_line() + 
  geom_point() +
  
# ADD ANNOTATIONS  
  # Low Emission Standards 2004
  annotate(
    geom = "text",
    x = 2000.6,
    y = 80,
    label = "LEV II Vehicle\nStandards Take Effect",
    size = 3,
    color = "black",
    hjust = "inward"
  ) +
  # add arrow
  annotate(
    geom = "segment",
    x = 2004, xend = 2004,
    y = 89, yend = 81,
  ) +  
  
  # Advanced Clean Cars Program & Cap-and-Trade Begin 2012
  annotate(
    geom = "text",
    x = 2015.65,
    y = 70.75,
    label = "Advanced Clean\nCars Program Begins",
    size = 3,
    color = "black",
    hjust = 1  # Left-aligned text
  ) +
  # add arrow
annotate(
    geom = "segment",  
    x = 2012, xend = 2012, 
    y = 73, yend = 83,  
  ) +
  # Bobcat Fire 2020
  annotate(
    geom = "text",
    x = 2022.21,
    y = 93.5,
    label = "Bobcat Fire",
    size = 3,
    color = "black",
    hjust = "inward"
  ) +
  # add arrow
  annotate(
    geom = "segment",
    x = 2020, xend = 2020,
    y = 91.5, yend = 86.65,
  ) +
  

# BACK TO PLOTTING  
  scale_y_continuous(
    limits = c(50, 105),
    breaks = seq(50, 100, by = 10)
  ) +
  
  scale_x_continuous(
    limits = c(1999, 2024),
    breaks = c(seq(2000, 2020, by = 5), 2024),
  ) +
  
# TITLES, ETC.  
labs(
  title = "Los Angeles Air Quality Trends (2000–2024)",
  subtitle = subtitle,
  caption = "Data source: EPA (2025)"

    ) +
  
  theme_minimal() +
  
# CUSTOMIZE THEME
  theme(
    plot.title.position = "plot", # shift title to the left
    plot.title = element_text(family = "mont",
                            face = "bold",
                            size = 18,
                            color = "black"),  # Fixed here by closing the parenthesis
    plot.subtitle = ggtext::element_textbox(family = "open_sans",
                             size = 11.5,
                             color = "black",
                             margin = margin(t = 2, r = 0, b = 6, l = 0)),  # move in clockwise to top, right, bottom, left
  plot.caption = ggtext::element_textbox(family = "open_sans",
                             face = "italic",
                             color = "black",
                             margin = margin(t = 15, r = 0, b = 0, l = 0)),
  
    # panel.grid.major.y = element_line(color = "gray90"),
    panel.grid.major.y = element_blank(),
    panel.grid.minor.y = element_blank(),
    panel.grid.minor.x = element_blank(),
    panel.grid.major.x = element_blank(),
    
    axis.line.y = element_blank(),
    axis.text.y = element_blank(), # remove y axis text
    axis.title = element_blank(),   # Remove axis titles
    
    plot.margin = margin(t = 10, r = 10, b = 10, l = 10)
  
  )

aqi_plot

6.3 Main Pollutants in the AQI

6.3.1 Import Data, Tidy, & Wrangle

# data ranges from 2000 - 2024
years <- 2000:2024

# -------- S E T  U P  L O O P  --------
common_pollutant_list <- map(years, ~ {
  
  # file path
  aqi_file_path <- here("data/aqi_daily", paste0("aqidaily_", .x, ".csv"))
  
  # -------- R E A D  I N  D A T A --------
  read_csv(aqi_file_path) %>%
    clean_names() %>%
    
    # character to date
    mutate(date = mdy(date)) %>%
    
    # extract year
    mutate(year = year(date)) %>%
    
    select(year, main_pollutant)

})

# Combine all common pollutant calculations into one data frame
common_pollutant_all <- bind_rows(common_pollutant_list)

# ---------- SUMMARIZE BY YEAR ---------

# count occurrences of pollutants for the year
top_pollutants <- common_pollutant_all %>% 
  group_by(year, main_pollutant) %>%
  summarize(count = n(), .groups = "drop") %>% 

  group_by(main_pollutant) %>%
  summarize(avg_count = mean(count), .groups = "drop")

6.3.2 Viz - Choice 1 - Circles:

library(packcircles)

# -------- STEP 1:Compute circle layout----------

# circleProgressiveLayoutcalculates the positions and sizes of circles to create a non-overlapping
circle_layout <- circleProgressiveLayout(top_pollutants$avg_count, sizetype = "area")

# Add a small gap between circles
circle_layout$radius <- circle_layout$radius * 0.95

# Add coordinates back to the pollutant data
circle_data <- cbind(top_pollutants, circle_layout)

# create vertices for circle polygons
circle_vertices <- circleLayoutVertices(circle_layout, npoints = 50)

# -------- STEP 2: ADD NESTED CIRCLES ----------
# Generate radius, x, and y coordinates for nested circles
circle_layout_border <- circle_layout
circle_layout_border$radius <- circle_layout$radius * 0.95

# Create vertices for nested circles
circle_vertices_border <- circleLayoutVertices(circle_layout_border, npoints = 50)

# # -------- STEP 3: JOIN TABLES TO ADD LABELS ----------
# 
# Combine the pollutant data with circles (center coordinates and radius)
circle_labels <- cbind(top_pollutants, circle_layout)
# -------- BASE PLOT(s) ----------

# Create the base plots
gg_circles <- ggplot() +
 # Draw the main circles
 geom_polygon(data = circle_vertices, 
              aes(x = x, y = y, group = id, fill = as.factor(id)), 
              color = "grey90",          # add white borders
              size = 0.2,              # thinner border 
              alpha = 0.9) +           
 theme_void() +     
 coord_equal()
# coord_equal(expand = TRUE, xlim = c(-14.75, 14.75), ylim = c(-14.75, 14.75))  # size of cirlces                     # ensure circles are perfectly round
     

# Add the nested circles layer
gg_circles_nested <- gg_circles +

 geom_polygon(data = circle_vertices_border,
              aes(x = x, y = y, group = id, fill = as.factor(id)),
              color = "grey90",        # match border color of main circles
              size = 0.2,           
              alpha = 0.5)            # more transparent for nested effect
main_pollutant_circles <- gg_circles_nested +
  # Add text labels inside the circles
  geom_text(data = circle_labels,
            aes(x = x, y = y, 
                size = avg_count,      # text size varies with pollution count
                label = main_pollutant), 
            fontface = "bold",
            family = "mont",          
            color = "#2F2F2F") +   
  scale_fill_manual(values = smog_pal2) +  # custom color palette
  scale_size_continuous(range = c(2, 7)) + # set text size range (small --> large)
  
  theme_void() +  

  # LABS ---  
  labs(
    title = "Pollutant Distribution",  # Main chart title
    subtitle = "Median occurrence of dominant pollutants from 2000 to 2024",  # Subtitle with date range
    caption = "Data: EPA, 2025"  # Data source
  ) +  

  # CUSTOMIZE THEME ---
  theme(
    legend.position = "none",  
    plot.title.position = "plot", 
    plot.caption.position = "plot",  
    
    # title
    plot.title = element_text(family = "mont",  # Use "mont" font
                              face = "bold",  # Make it bold
                              size = 18,  # Increase font size
                              color = "black",  
                              hjust = 0,  # Align to the far left
                              margin = margin(b = 5)),  # Add spacing below title
    
    # subtitle
    plot.subtitle = element_text(family = "open_sans",  
                                 size = 11.5,  
                                 color = "black",
                                 hjust = 0,  
                                 margin = margin(b = 10)), 
    
    # caption
    plot.caption = element_text(family = "open_sans",
                                face = "italic",
                                color = "black",
                                hjust = 1,  
                                margin = margin(t = 15)),  # Add spacing above caption
    
    # overall plot margins
    plot.margin = margin(t = 20, r = 20, b = 20, l = 20)  # Ensure spacing from edges
  )

main_pollutant_circles

6.3.3 Viz - Choice 2 - Donuts:

# Prepare the data for the donut chart using top_pollutants
donut_data <- top_pollutants %>% 
  # Calculate the fraction of each pollutant and make new columns
  mutate(
    fraction = avg_count / sum(avg_count),
    percent  = round(fraction * 100), # caluclaute the percent from the fration
    # Create a label that shows the pollutant and its percentage
    label    = paste0(main_pollutant, "\n", percent, "%") # reminder: "\n" give you a space
  ) %>% 
  # Arrange in descending order of main pollutant
  arrange(desc(main_pollutant)) %>% 
  
  # Compute cumulative values for the donut slices
  mutate(
    # cumulative sum (cumsum) gives us the upper boundary 
    ymax          = cumsum(fraction), 
   #lag() function shifts the values of ymax down by one row
   # For the very first row, there is no previous value, so default = 0 is used
    ymin          = lag(ymax, default = 0), 
   # if i want to place the labels at the midpoint of the slice - optional
   # calculates the midpoint of each slice by averaging the ymin and ymax.
    labelPosition = (ymax + ymin) / 2
  )
# Create the donut chart using ggplot2
donut_box <- ggplot(donut_data, 
                aes(ymax = ymax,  # Upper limit of each slice
                    ymin = ymin,  # Lower limit 
                    xmax = 4,  # Outer radius of the donut
                    xmin = 3,  # Inner radius 
                    fill = main_pollutant)) + 
  
  # Draw the slices as rectangles
  geom_rect(color = "grey98") +  # Add light grey borders between slices
  
  # Add labels inside each slice
  geom_label(aes(x = 3.5,  # Position labels at the middle of the donut ring
                 y = labelPosition,  # Y-axis position of labels
                 label = label),  # Text to display
             color = "black", size = 3) +  # Label color and font size
  
  # Convert the bar chart into a circular donut chart
  coord_polar(theta = "y") +  

  # Control the thickness of the donut by setting x-axis limits
  xlim(c(2, 4)) +  

  # Apply a custom color palette for pollutants
  scale_fill_manual(values = smog_pal2) +  
  
  # LABS ---  
  labs(
    title = "Pollutant Distribution",  # Main chart title
    subtitle = "Median occurrence of dominant pollutants from 2000 to 2024",  # Subtitle with date range
    caption = "Data: EPA, 2025"  # Data source
  ) +  

  # Remove background, grid, and axes for a clean look
  theme_void() +  

# CUSTOMIZE THEME ---
  theme(
    legend.position = "none",  # Remove the legend
    plot.title.position = "plot",  # Position title relative to the entire plot
    plot.caption.position = "plot",  # Position caption relative to the entire plot
    
    # title
    plot.title = element_text(family = "mont",  # Use "mont" font
                              face = "bold",  # Make it bold
                              size = 18,  # Increase font size
                              color = "black",  
                              hjust = 0,  # Align to the far left
                              margin = margin(b = 5)),  # Add spacing below title
    
    # subtitle
    plot.subtitle = element_text(family = "open_sans",  
                                 size = 11.5,  
                                 color = "black",
                                 hjust = 0,  # Align to the far left
                                 margin = margin(b = 10)),  # Increase space below subtitle
    
    # caption
    plot.caption = element_text(family = "open_sans",
                                face = "italic",
                                color = "black",
                                hjust = 1,  # Align to the far right
                                margin = margin(t = 15)),  # Add spacing above caption
    
    # overall plot margins
    plot.margin = margin(t = 10, r = 20, b = 10, l = 10)  # Ensure spacing from edges
  )


donut_box

# # Save 
# ggsave("donut_box.pdf",  # filename
#        donut, # plot to save
#        width = 10,  
#        height = 6)

6.4 Sources of PM 2.5

6.4.1 Point Sources - Import, Tidy, & Wrangle

# --- POINT SOURCES -----

# bring in data and clean names
point_sources <- read.csv(here("data/sources/facility_point_sources_2020.csv")) %>% 
  clean_names()

# filter for 2.5 and make emissions numeric
point_sources <- point_sources %>% 
  filter(pollutant %in% c("PM2.5 Primary (Filt + Cond)", "PM2.5 Filterable")) %>%
  mutate(emissions_tons = as.numeric(emissions_tons))

# TOP EMITTERS ---

# Top 50 emitters
top_50_point_sources <- point_sources %>% 
  slice_max(order_by = emissions_tons, n = 50)


# Major emitters (100+ tons)
major_point_sources <- point_sources %>% 
  filter(emissions_tons >= 100)

6.4.2 Nonpoint Sources - Import, Tidy, & Wrangle

# --- NONPOINT SOURCES -----

# bring in data and clean names

nonpoint_sources <- read.csv(here("data/sources/nonpoint_sources_2020.csv")) %>% 
  clean_names()


nonpoint_sources <- nonpoint_sources %>% 
  filter(pollutant %in% c("PM2.5 Primary (Filt + Cond)", "PM2.5 Filterable")) %>% 
  mutate(emissions_tons = as.numeric(emissions_tons)) %>% 
  drop_na()

# TOP EMITTERS ---

# Top 50 emitters for nonpoint sources
top_50_nonpoint_sources <- nonpoint_sources %>% 
   slice_max(order_by = emissions_tons, n = 50)

# Major emitters for nonpoint sources
major_nonpoint_sources <- nonpoint_sources %>% 
  filter(emissions_tons >= 100)

6.4.3 Combine Sources:

# COMBINE SOURCES: 
combined_sources <- bind_rows(
  nonpoint_sources %>% 
    mutate(source_type = "nonpoint",
           # Use scc_level_2 as the pollution source name
           pollution_source = eis_sector,
           # Add empty columns for point-source-specific fields
           site_name = NA,
           eis_facility_id = NA,
           facility_type = NA,
           street_address = NA,
           naics = NA,
           lat_lon = NA,
           longitude = NA,
           latitude = NA),
  
  point_sources %>% 
    mutate(source_type = "point",
           # Use facility_type as the pollution source name
           pollution_source = facility_type,
           # Add empty columns for nonpoint-source-specific fields
           scc_code = NA,
           eis_sector = NA,
           source_description = NA,
           scc_level_1 = NA,
           scc_level_2 = NA,
           scc_level_3 = NA,
           scc_level_4 = NA)
)

# WRANGLE ---

combined_sources <- combined_sources %>% 
  select(state_county, 
         pollutant,
         emissions_tons,
         source_type,
         pollution_source)

# TOP EMITTERS ----

# Top 50 emitters for all sources
top_50_sources <- combined_sources %>% 
   slice_max(order_by = emissions_tons, n = 50)

# Major emitters for all sources (greater than 100 tons)
major_sources <- combined_sources %>% 
  filter(emissions_tons >= 100)

# top 10 
top_10_sources <- top_50_sources %>% 
  group_by(pollution_source, source_type) %>% 
  summarise(total_emissions = sum(emissions_tons, na.rm = TRUE)) %>% 
  ungroup() %>% 
  slice_max(order_by = total_emissions, n = 10) %>% 
  mutate(pollution_source = recode(pollution_source, # rename variables
  "Industrial Processes - Not Elsewhere Classified" = "Other Industrial Processes",
  "Fuel Comb - Residential - Wood" = "Residential Wood Combustion",
  "Mobile - On-Road non-Diesel Light Duty Vehicles" = "Light Duty Vehicles",
  "Dust - Construction Dust" = "Construction Dust",
  "Waste Disposal" = "Waste Disposal",
  "Petroleum Refinery" = "Petroleum Refineries",
  "Fires - Wildfires" = "Wildfires",
  "Fuel Comb - Residential - Natural Gas" = "Residential Natural Gas Combustion",
  "Miscellaneous Non-Industrial Not Elsewhere Classified" = "Other Non-Industrial",
  "Dust - Unpaved Road Dust" = "Unpaved Road Dust"
))

6.4.4 Viz:

#subtitle: 
subtitle <- glue::glue("
    Analyzing Major Emission Contributors Across  <span style='color:#B77B70;'>**Point**</span> and <span style='color:#CC7E62;'>**Non-Point Sources**</span> in Los Angeles"
)

ggplot(top_10_sources, 
       aes(x = fct_reorder(pollution_source, total_emissions),  
           y = total_emissions,  
           fill = source_type)) + 
       

  geom_col() +  
  
  # Adding a horizontal line at y = 100 to highlight major source threshold
  geom_hline(yintercept = 100, linetype = "dashed", color = "black") +  # Horizontal line at y = 100
  
  # # rectangle -- i dont think I need this
  #   annotate(
  #   geom = "rect",
  #   xmin = 7.25, xmax = 9.75,
  #   ymin = 500, ymax = 1125,
  #   alpha = 0.5,
  #   fill = "gray70", color = "black"
  # ) +
  #text
annotate(
  geom = "text",
  x = 3,
  y = 1030,
  label = str_wrap("The EPA defines 100 tons as the threshold for a major source of pollution", width = 25),
  size = 3,
  color = "black",
  hjust = 0
) +
# arrow
  annotate(
    geom = "curve",
    x = 3, xend = 5,   # Closer to the line horizontally
    y = 1010, yend = 145,  # Lower the arrow to be closer to the threshold line
    curvature = -0.1,  # Slight curve
    arrow = arrow(length = unit(0.3, "cm"))
  ) +

  # Wrapping x-axis labels to 25 characters 
  scale_x_discrete(labels = function(x) str_wrap(x, width = 25)) +  
  
 
  coord_flip() +  
  
  scale_fill_manual(values = smog_sub_pal) + 
  
  # Format y-axis labels to include "tons"
  scale_y_continuous(labels = function(x) paste0(x, " tons")) +  # Adding "tons" to y-axis labels
  
# LABS---
  labs(
    title = "Top Sources of PM2.5 Pollution in 2020", 
    subtitle = subtitle,  
    caption = "Data source: EPA (2025)",  
    fill = "Source Type"  
  ) +
  
  theme_minimal() +
  
# THEME ---
  theme(
    axis.title = element_blank(),  # Removing axis titles (x and y)
    panel.grid.major.x = element_blank(),  
    panel.grid.minor.x = element_blank(),  
    panel.grid.minor.y = element_blank(),  
    panel.grid.major.y = element_blank(),  
    

    plot.title.position = "plot",  # Shifting the title to the left
    plot.title = element_text(family = "mont",  
                              face = "bold", 
                              size = 18,  
                              color = "black"), 
    
 
    plot.subtitle = ggtext::element_textbox(family = "open_sans",  
                                           size = 11.5,  
                                           color = "black", 
                                           margin = margin(t = 2, r = 0, b = 6, l = 0)),  
    

    plot.caption = ggtext::element_textbox(family = "open_sans",
                                           face = "italic",  
                                           color = "black",  
                                           margin = margin(t = 15, r = 0, b = 0, l = 0)), 
    
    # Adjusting plot margins for overall spacing
    plot.margin = margin(t = 10, r = 10, b = 10, l = 10), 
    
    # Customizing legend  - plot.legend isnt working
    legend.position = "none",  
)

6.5 Where is PM 2.5 pollution most prevalent?

6.5.1 Import, Tidy, & Wrangle

# --------------------Spatial Data-----------------------
# bring in enviroscreen shapefile
enviroscreen_sf <- read_sf(here("data/enviroscreen/enviroscreen_shapefiles/CES4_final_shapefile.shp")) %>% 
  clean_names() 


#-------- Tidy Data-------------

# Define excluded locations and identifiers
excluded_locations <- c("Santa Clarita", "Palmdale", "Lancaster", "Acton", 
                       "Agua Dulce", "Altadena", "Lake Los Angeles",
                       "Leona Valley", "La Crescenta-Montrose")

excluded_tracts <- c("6037911001", "6037910002", "6037403325", "6037104124",
                     "6037920326", "6037930301", "6037910709", "6037920303")

excluded_zips <- c("90265", "93535", "93552", "93532", "90704",
                   "91384", "91387", "91390", "93510", "93536", "91351",
                   "91011", "91355", "93551", "91342", "91381")

# Tidy data with simplified filtering
enviroscreen_sf <- enviroscreen_sf %>% 
  filter(county == "Los Angeles") %>% 
  select(tract, 
         zip,
         approx_loc,
         pm2_5_p,
         geometry,  
         county) %>% 
  filter(!approx_loc %in% excluded_locations) %>%
  filter(!tract %in% excluded_tracts) %>%
  filter(!zip %in% excluded_zips)

6.5.2 Viz:

# BASE MAP

base_map <- ggplot(enviroscreen_sf) +
  geom_sf(aes(fill = pm2_5_p, color = NULL), color = NA, linewidth = 0) + 
  theme_void()
pm_map <- base_map +
# adjust colors:
 scale_fill_gradientn(colors = smog_pal2,
                       labels = label_percent(scale = 1), # add percentage sign to each of our values
                       breaks = breaks_width(width = 10),
                       # values = scales::rescale(x = c(43,99)) 
) + 
    
  # update look of legend: 
  guides(fill = guide_colorbar(barwidth = 25, 
                               barheight = 0.75)) + #stretch out legend
  
labs(
  title = "Mapping PM2.5 Pollution in Los Angeles",
  subtitle = "Census tract percentiles relative to all California—<br>darker shades indicate higher pollution",
  caption = "Data: CalEnviroscreen, 2025"
) +

  # customize theme: 
theme(
  plot.title.position = "plot", # shift title to the left
  plot.title = element_text(family = "mont",
                            face = "bold",
                            size = 18,
                            color = "black"),  # Fixed here by closing the parenthesis
  plot.subtitle = ggtext::element_textbox(family = "open_sans",
                             size = 11.5,
                             color = "black",
                             margin = margin(t = 2, r = 0, b = 6, l = 0)),  # move in clockwise to top, right, bottom, left
  plot.caption = ggtext::element_textbox(family = "open_sans",
                             face = "italic",
                             color = "black",
                             margin = margin(t = 15, r = 0, b = 0, l = 0)),
  
  legend.position = "bottom",  # move legend to the bottom
  legend.title = element_blank(),  # no legend title
  
  # margins
  plot.margin = margin(t = 10, r = 10, b = 10, l = 10)
) 

pm_map

7 Question 7:

  1. Answer the following questions:

    • a. What challenges did you encounter or anticipate encountering as you continue to build / iterate on your visualizations in R? If you struggled with mocking up any of your three visualizations (from #6, above), describe those challenges here.

      • I’m struggling a bit to decide how I want to show the top pollutant in the AQI - I’m thinking either the donut or the bubbles ( i know neither of them are perfect), but I like the visual aspect of having circles juxtaposed with the rectangle of the line chart for the AQI and the bar chart for the pollutant sources.
      • I’d also like to have some identifiers on my map - maybe show where downtown is or hollywood, but I’m unsure which annotations would be effective and not clutter the map.
    • b. What ggplot extension tools / packages do you need to use to build your visualizations? Are there any that we haven’t covered in class that you’ll be learning how to use for your visualizations?

      • I’m going to use Affinity to build my infographic - I’m not sure if there’s anything we havent covered. Maybe more of the circular or polar graphs.
      • To be honest, I’m not sure what you mean by ggplot extension tools.
    • c. What feedback do you need from the instructional team and / or your peers to ensure that your intended message is clear?

      • Any thoughts on how to put everything together into a cohesive infographic/story. I want to kind of go from the big pictures to the more granular level - showing the general air quality over time, and then what pollutants contribute to it, and then where do those come from and who is affected?

      • Also any thoughts on how to best show the pollutants.